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Abstract 

Inference in Bayesian statistics involves the evaluation of marginal 
likelihood integrals. We present algebraic algorithms for computing 
such integrals exactly for discrete data of small sample size. Our meth- 
ods apply to both uniform priors and Dirichlet priors. The underlying 
statistical models are mixtures of independent distributions, or, in ge- 
ometric language, secant varieties of Segre- Veronese varieties. 

Keywords: marginal likelihood, exact integration, mixture of inde- 
pendence model, computational algebra 

1 Introduction 

Evaluation of marginal likelihood integrals is central to Bayesian statistics. It 
is generally assumed that these integrals cannot be evaluated exactly, except 
in trivial cases, and a wide range of numerical techniques (e.g. MCMC) have 
been developed to obtain asymptotics and numerical approximations |JJ . The 
aim of this paper is to show that exact integration is more feasible than is 
surmised in the literature. We examine marginal likelihood integrals for a 
class of mixture models for discrete data. Bayesian inference for these mod- 
els arises in many contexts, including machine learning and computational 
biology. Recent work in these fields has made a connection to singularities in 
algebraic geometry O [9], HH [151 [16] . Our study augments these developments 
by providing tools for symbolic integration when the sample size is small. 

The numerical value of the integral we have in mind is a rational number, 
and exact evaluation means computing that rational number rather than a 
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floating point approximation. For a first example consider the integral 

/ n (vrA^Af + rpf)pf )^-rfvrrfrrfArfp, (1) 

i,iG{A,C,G,T} 

where is the 13- dimensional polytope Ai x A3 x A3 x A3 x A3. The factors 
are probability simplices, i.e. 



Ai 
A3 
A3 



{(7r,r) GM|o:7r + r = 



1}, 

(fc) 



1}, k = l,2, 
1}, k = l,2. 



and we integrate with respect to Lebesgue probability measure on B. If we 
take the exponents Uij to be the entries of the particular contingency table 



U 



/4 2 2 2\ 

2 4 2 2 

2 2 4 2 

\2 2 2 4/ 



(2) 



then the exact value of the integral ([T]) is the rational number 

571 ■ 773426813 ■ 17682039596993 ■ 625015426432626533 
231 . 320 . 512 . 711 . 118 . 137 . 175 . 195 . 235 . 293 • 313 . 373 . 413 . 432 ■ 

The particular table ([2]) is taken from [121 Example 1.3], where the integrand 



(3) 



n (-^?'^f 



(4) 



i,je{A,C,G,T} 



was studied using the EM algorithm, and the problem of validating its global 
maximum over B was raised. See P, §4.2] and [13l §3] for further discussions. 
That optimization problem, which was widely known as the 100 Swiss Francs 
problem, has in the meantime been solved by Gao, Jiang and Zhu [8]. 

The main difficulty in performing computations such as ([1]) = ([3]) lies in 
the fact that the expansion of the integrand has many terms. A first naive 
upper bound on the number of monomials in the expansion of (jl]) would be 



i,ie{A,C,G,T} 



3^^ ■ 5^ 



332,150,625. 
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However, the true number of monomials is only 3, 892, 097, and we obtain the 
rational number by summing the values of the corresponding integrals 



f 7r'^ir«2(A«)«(A(2))-(p«)-(p(2))-rfvrrfrrfAdp 



ai!a2! 3! H.^J 3! Ui^i^- 3! Ui^jl 3! H^x,! 
(ai+a2+l)! ■ + ■ (E.^. + 3)! ' (E.^. + 3)! ' {E^x^ + ^V■' 



The geometric idea behind our approach is that the Newton polytope of (jlj) 
is a zonotope and we are summing over the lattice points in that zonotope. 
Definitions for these geometric objects are given in Section 3. 

This paper is organized as follows. In Section 2 we describe the class 
of algebraic statistical models to which our method applies, and we specify 
the problem. In Section 3 we examine the Newton zonotopes of mixture 
models, and we derive formulas for marginal likelihood evaluation using tools 
from geometric combinatorics. Our algorithms and their implementations are 
described in detail in Section 4. Section 5 is concerned with applications in 
Bayesian statistics. We show how Dirichlet priors can be incorporated into 
our approach, we discuss the evaluation of Bayes factors, we compare our 
setup with that of Chickering and Heckerman in [Tj, and we illustrate the 
scope of our methods by computing an integral arising from a data set in [5] . 

A preliminary draft version of the present article was published as Section 
5.2 of the Oberwolfach lecture notes [4J. We refer to that volume for further 
information on the use of computational algebra in Bayesian statistics. 

2 Independence Models and their Mixtures 

We consider a collection of discrete random variables 



where X| , . . . , XsJ are identically distributed with values in {0, 1, ... , tj}. 
The independence model Ai for these variables is a toric model §1-2] 



x^(2) ^(2) 
-^1 ) -^2 ? 
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represented by an integer d x n-matrix A with 

fc 

d = ti + t2-\ htk + k and n = + (5) 

The columns of the matrix A are indexed by elements v of the state space 

{0,1,..., ti^ X {0,1,..., X ■■■ X {0,l,...,4r'=. (6) 

The rows of the matrix A are indexed by the model parameters, which are 
the d coordinates of the points 9 = {9^^\ 9^'^\ . . . , 9^^^) in the polytope 

P = At,xAt,x---x At,, (7) 

and the model Ai is the subset of the simplex A„_i given parametrically by 

k Si 

= Prob(xf =i;f foralH,j) = HII^lw- (8) 

i=ij=i 

This is a monomial in d unknowns. The matrix A is defined by taking its 
column tty to be the exponent vector of this monomial. 

In algebraic geometry, the model Ai is known as Segre- Veronese variety 

pti X p*2 X • ■ ■ X F^" ^ P""\ (9) 

where the embedding is given by the line bundle 0{si, S2, ■ ■ ■ , Sk)- The man- 
ifold Ad is the toric variety of the polytope P. Both objects have dimension 
d — k, and they are identified with each other via the moment map [7, §4]. 

Example 2.1. Consider three binary random variables where the last two 
random variables are identically distributed. In our notation, this corresponds 
to A; = 2, si = 1, S2 = 2 and ^1=^2 = 1- We find that (i = 4, n = 8, and 
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The columns of this matrix represent the monomials in the parametrization 
(jSj). The model Ai lies in the 5-dimensional subsimplex of A7 given by 
Pool = Poio and pioi = puo, and it consists of all rank one matrices 

fpooo Pool Pioo PioA 
VPoio Poll Piio Pill / ' 

In algebraic geometry, the surface Ai is called a rational normal scroll. □ 

The matrix A has repeated columns whenever Si > 2 for some i. It 
is sometimes convenient to represent the model by the matrix A which 
is obtained from A by removing repeated columns. We label the columns 
of the matrix A by elements v = {v^^\ . . . ,v^''^) of ([6]) whose components 
f E {0,1, ... ,ti}'^^ are weakly increasing. Hence A is a d x n- matrix with 

The model Ai and its mixtures are subsets of a subsimplex An^i of A„_i. 

We now introduce marginal likelihood integrals. All our domains of inte- 
gration in this paper are polytopes that are products of standard probability 
simplices. On each such polytope we fix the standard Lebesgue probability 
measure. In other words, our discussion of Bayesian inference refers to the 
uniform prior on each parameter space. Naturally, other prior distributions, 
such as Dirichlet priors, are of interest, and our methods are extended to 
these in Section 5. In what follows, we simply work with uniform priors. 

We identify the state space dHD with the set {1, . . . , n}. A data vector 
U = {Ui, . . . , Un) is thus an element of N". The sample size of these data is 
U1 + U2 + ■ — \-Un = N. If the sample size N is fixed then the probability of 
observing these data is 

This expression is a function on the polytope P which is known as the like- 
lihood function of the data U with respect to the independence model A4. 
The marginal likelihood of the data U with respect to the model Ai equals 

j^'i^uio) de. 
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The value of this integral is a rational number which we now compute explic- 
itly. The data U will enter this calculation by way of the sufficient statistic 
b = A-U, which is a vector in N'^. The coordinates of this vector are denoted 
b^j^ for i = 1, . . . , k and j = 0, . . . , t^. Thus b^j^ is the total number of times 

the value j is attained by one of the random variables xj*"*, . . . in the 
i-th group. Clearly, the sufficient statistics satisfy 

b^^ + b? + ■■■ + foSf = Si-N for all z = 1, 2, ... , k. (11) 

The likelihood function L[/(6') is the constant , times the monomial 

i=l j=0 



The logarithm of this function is concave on the polytope P, and its maxi- 
mum value is attained at the point 6 with coordinates Oj"^ = b^f /{si ■ N). 

Lemma 2.2. The integral of the monomial 6'' over the polytope P equals 

k 



! e^de = f[ 
jp ,-1 



t-\b^'hb^^h . . . h^h 



{SiN + t,)\ 

The product of this number with the multinomial coefficient N\/{Ui \ ■ ■ ■ f/„!) 
equals the marginal likelihood of the data U for the independence model Ai. 

Proof. Since P is the product of simplices ([7j), this follows from the formula 

A, {bo + bi + --- + bt + ty. ^ ^ 

for the integral of a monomial over the standard probability simplex A^. □ 

Our objective is to compute marginal likelihood integrals for the mixture 
model Ai^'^\ The natural parameter space of this model is the polytope 

e = Ai X P X P. 

Let Ot, G N'^ be the column vector of A indexed by the state v, which is either 
in ([6]) or in {1,2, ... ,n}. The parametrization ([8]) can be written simply as 
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Pt, = 6°"^. The mixture model A4^'^^ is defined to be the subset of A„_i with 
the parametric representation 

= (To -9^- + ai-p'^" for {(r,9,p) e 9. (13) 

The hkehhood function of a data vector f/ G N" for the model Ai^"^^ equals 

Luia,e,p) = f p,ia,e,pf^ ■■■p„ia,e,pf". (14) 

Ui\U2i ■ ■ ■ Un- 

The marginal likelihood of the data U with respect to the model Ai^"^^ equals 

/ Lu{a,e,p)dadedp = , / YlM"^ +^ip"^f"dadedp. (15) 

Je f^i' ■■ - Un- Je \ 

The following proposition shows that we can evaluate this integral exactly. 

Proposition 2.3. The marginal likelihood ( 173]) is a rational number. 

Proof. The likelihood function Lu is a Q>o-linear combination of monomials 
^aQbpC^ The integral (|T5l) is the same Q>o-linear combination of the numbers 

f a^e^p^dadOdp = ( / a^da) ■ ( / 9^d9) ■ ( [ p^dp). 
Je JAi Jp Jp 

Each of the three factors is an easy-to-evaluate rational number, by ( |T2l) . □ 

Example 2.4. The integral ([1]) expresses the marginal likelihood of a 4 x 4- 
table of counts U = (Uij) with respect to the mixture model A4^'^\ Specifi- 
cally, the marginal likelihood of the data ([2]) equals the normalizing constant 
40! ■ (2!)-^2 . (4i)-4 ^-j^gg ^Yie number The model M^^^ consists of all 
non-negative 4 x 4-matrices of rank < 2 whose entries sum to one. Here 
the parametrization (fT3|) is not identifiable because dim(7W^^)) = 11 but 
dim(6) = 13. In this example, k = 2, si=S2=l, ti=t2=3, d = 8, n = 16. □ 

In algebraic geometry, the model Ai^'^'^ is known as the first secant variety 
of the Segre- Veronese variety ([H]). We could also consider the higher secant 
varieties Ai^''\ which correspond to mixtures of / independent distributions, 
and much of our analysis can be extended to that case, but for simplicity we 
restrict ourselves to Z = 2. The variety J^^"^^ is embedded in the projective 
space P"~i with n as in ffTOl) . Note that n can be much smaller than n. If this 
is the case then it is convenient to aggregate states whose probabilities are 
identical and to represent the data by a vector f/ G N". Here is an example. 
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Example 2.5. Let k=l, Si=4 and ti=l, so Ai is the independence model 
for four identically distributed binary random variables. Then d = 2 and 
n = 16. The corresponding integer matrix and its row and column labels are 



A 



POOOO PooOl POOlO PolOO Piooo POOll ■ ■ ■ PlllO Piiii 

4 3 3 3 3 2 ••• 1 
1 1 1 1 2 ••• 3 4 



However, this matrix has only n = 5 distinct columns, and we instead use 



A 



01 



PO Pi P2 P3 P4 

4 3 2 1 
12 3 4 



The mixture model Ai^'^^ is the subset of A4 given by the parametrization 
= Q • {cro ■ ■ 0{ + ai ■ pt' ■ pi) for 2 = 0, 1, 2, 3, 4. 



Pi 



In algebraic geometry, this threefold is the secant variety of the rational 
normal curve in P^. This is the cubic hypersurface with the implicit equation 



det 



12po 3pi 2p2 
3pi 2p2 3p3 
2p2 3ps 12pi 



0. 



In [TTl Example 9] the likelihood function f[T^ was studied for the data vector 

U = iUo,Ui,U2,U3,U^) = (51,18,73,25,75). 

It has three local maxima (modulo swapping 6 and p) whose coordinates are 
algebraic numbers of degree 12. Using the methods to be described in the 
next two sections, we computed the exact value of the marginal likelihood for 
the data U with respect to Ai^'^\ The rational number fll5p is found to be the 
ratio of two relatively prime integers having 530 digits and 552 digits, and its 
numerical value is approximately 0.7788716338838678611335742 ■ lO'^^. □ 



3 Summation over a Zonotope 

Our starting point is the observation that the Newton polytope of the like- 
lihood function f[T^ is a zonotope. Recall that the Newton polytope of a 
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polynomial is the convex hull of all exponent vectors appearing in the ex- 
pansion of that polynomial, and a polytope is a zonotope if it is the image 
of a standard cube under a linear map. See [H §7] and JTT, §7] for further 
discussions. We are here considering the zonotope 

n 

Za{U) = [0,a,], 

v=l 

where [0, a„] represents the line segment between the origin and the point 
ay G M*^, and the sum is a Minkowski sum of line segments. We write 
Za = ^^(1, 1, • • • 5 1) for the basic zonotope which is spanned by the vectors 
a„. Hence Z^iU) is obtained by stretching Za along those vectors by factors 
Uy respectively. Assuming that the counts are all positive, we have 

dim(ZA(f/)) = dim(ZA) = rank(A) = d-k + l. (16) 

The zonotope Za is related to the polytope P = conv(74) in ([7]) as follows. 
The dimension d — k = ti + ■ ■ ■ + tk of P is one less than dim(ZA), and P 
appears as the vertex figure of the zonotope Za at the distinguished vertex 0. 

Remark 3.1. For higher mixtures Ai^''\ the Newton polytope of the like- 
lihood function is isomorphic to the Minkowski sum of (/ — l)-dimensional 
simplices in R^'"^)'^. Only when I = 2, this Minkowski sum is a zonotope. □ 



The marginal likelihood ([T5|l we wish to compute is the integral 

« n 

/ JJicToe"'^ + cTip^^f^dadedp (17) 

times the constant N\/{Ui\ ■ ■ - UJ.)- Our approach to this computation is 
to sum over the lattice points in the zonotope Za{U). If the matrix A has 
repeated columns, we may replace A with the reduced matrix A and U 
with the corresponding reduced data vector U. If one desires the marginal 
likelihood for the reduced data vector U instead of the original data vector 
U, the integral remains the same while the normalizing constant becomes 

■ (Ti ■ ■ ■ (T- 



where a, is the number of columns in A equal to the i-th column of A. In 
what follows we ignore the normalizing constant and focus on computing the 
integral f[T7l) with respect to the original matrix A. 
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For a vector b G M>q we let |6| denote its L^-norm Ylt=i R-^call from 
that all columns of the d x n-matrix A have the same coordinate sum 



Si + S2 + ■ ■ ■ + Sk, for all f = 1, 2, . . . , n, 



and from flTT]) that we may denote the entries of a vector 6 G M*^ by 6^*^ for 



i = 1, . . . , k and j = 0, . . . ,1^. Also, let L denote the image of the linear map 
A : Z" — i> Z'^. Thus L is a sublattice of rank c? — /c + 1 in Z"^. We abbreviate 
Z\{U) := Za{U) n L. Now, using the binomial theorem, we have 

x„=0 ^^^^ 

Therefore, in the expansion of the integrand in (fT7|) . the exponents of ^ are 
of the form of 6 = '^^x^a^ G Z\{U), < Xy < Uy. The other exponents 
may be expressed in terms of h. This gives us 

n 

Writing D(?7) = {(xi, . . . , x„) G Z** : < Xt, < f4, f = 1, . . . , n}, we can 
see that the coefficient in f|T8|) equals 

'/>A(^t^) = E n(^i- (19) 

Ax=b v=l ^ 

xeD(i/) 

Thus, by formulas (fT2|) and (fT8|) . the integral (fT7|) evaluates to 

^ '^^^^'^^^ (|f/| + l)! 'U (|c»l+t.l' '-^ ^^ 



c=At/-b 



We summarize the result of this derivation in the following theorem. 

Theorem 3.2. The marginal likelihood of the data U in the mixture model 
Ai^'^^ is equal to the sum ^2^) times the normalizing constant N\ / (Ui\ ■ ■ ■ UJ.)- 
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Each individual summand in the formula fl2UI) is a ratio of factorials and 
hence can be evaluated symbolically. The challenge in turning Theorem 13.21 
into a practical algorithm lies in the fact that both of the sums (1191) and 
fl20|) are over very large sets. We shall discuss these challenges and present 
techniques from both computer science and mathematics for addressing them. 

We first turn our attention to the coefficients 0^(6, f/) of the expansion 
f[TS]l . These quantities are written as an explicit sum in flTIJl) . The first useful 
observation is that these coefficients are also the coefficients of the expansion 

Y^(^0-^ + lf. ^ J2Mb,U)-9\ (21) 

bez\{u) 

which comes from substituting (Xj = 1 and pj = 1 in f|T8l) . When the cardi- 
nality of Z\{U) is sufficiently small, the quantity 0yi(&, U) can be computed 
quickly by expanding fl^ using a computer algebra system. We used Maple 
for this purpose and all other symbolic computations in this project. 

If the expansion (12T|) is not feasible, then it is tempting to compute the 
individual 0^(6, U) via the sum-product formula (fT9|) . This method requires 
summation over the set {x G D(f/) : Ax = b}, which is the set of lattice 
points in an {n — d + k — l)-dimensional polytope. Even if this loop can be 
implemented, performing the sum in (1191) symbolically requires the evaluation 
of many large binomials, which causes the process to be rather inefficient. 

An alternative is offered by the following recurrence formula: 

Mb,U) = ^(^A(j)Awib-Xnan,U\Ur,). (22) 
This is equivalent to writing the integrand in ([T7|l as 



More generally, for each < i < n, we have the recurrence 

Mb,U)= Yl <pA'{b',U')-<f)A\A'{b-b',U\U'), 
b'ez\,{U') 

where A' and U' consist of the first i columns and entries of A and U respec- 
tively. This corresponds to the factorization 
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This formula gives flexibility in designing algorithms with different payoffs 
in time and space complexity, to be discussed in Section 4. 

The next result records useful facts about the quantities (pA^b, U). 

Proposition 3.3. Suppose h G 7}^{U) and c = AU — b. Then, the following 
quantities are all equal to 4>A{b, U): 

(1) i^{z G {0, 1}^ : A'^z = 6}, where A^ is the extended matrix 

A^ : = ( Qi, • J , Ql , Q2; • — ; ^2 ^; • • • ; ^n; • — ; Q-n) ; 

(3) 

^ n(":)' 

Ax=b v=l ^ 

where Uj = min {Uj}U{b,n/ajm}'!n=i (^i^'d h = Uj-min {Uj}U{cm/ajm}'^=i ■ 
Proof. (1) This follows directly from fl2Tl) . 

(2) For each z G {0, 1}^ satisfying A^ z = b, note that z = (1, 1, . . . , 1) — z 
satisfies A'^z = c, and vice versa. The conclusion thus follows from (1). 

(3) We require Ax = b and x G D([/). Uxj > Uj = bm/ajm then ajmXj > b^, 
which implies Ax ^ b. The lower bound is derived by a similar argument. □ 

One aspect of our approach is the decision, for any given model A and 
data set f/, whether or not to attempt the expansion fl^ using computer 
algebra. This decision depends on the cardinality of the set Z\{U). In what 
follows, we compute the number exactly when A is unimodular. When A is 
not unimodular, we obtain useful lower and upper bounds for ^Z\{U). 

Let S be any subset of the columns of A. We call S independent if its 
elements are linearly independent in M."^. With S we associate the integer 

index(5) := [MSnL:Z5]. 

This is the index of the abelian group generated by S inside the possibly 
larger abelian group of all lattice points in L = ZA that lie in the span of S. 
The following formula is due to R. Stanley and appears in [TOl Theorem 2.2]: 

Proposition 3.4. The number of lattice points in the zonotope Za{U) equals 
4fZ\iU) = Yl index(5) ■ n U.- (23) 

SCA indep. a„g5 
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In fact, the number of monomials in (ITS]) equals ^Ma{U), where M^(f/) 
is the set {b G Z\{U) : (pAip,!!) ^ 0}, and this set can be different from 
Z\{U). For that number we have the following upper and lower bounds. The 
proof of Theorem 13.51 will be omitted here. It uses the methods in [lOl §2] . 

Theorem 3.5. The number j^Mji{U) of monomials in the expansion l[T8\) 
of the likelihood function to be integrated satisfies the two inequalities 

J2 < #Ma{U) < Yl index(5)-nt^- (24) 

SCA indcp. veS SCA indcp. vGS 

By definition, the matrix A is unimodular if index(5') = 1 for all indepen- 
dent subsets 5* of the columns of A. In this case, the upper bound coincides 
with the lower bound, and so Ma{U) = Z\{U). This happens in the classical 
case of two-dimensional contingency tables {k = 2 and si = S2 = 1). In gen- 
eral, ^Z\{U) /^Ma_{U) tends to 1 when all coordinates of U tend to infinity. 
This is why we beheve that ^Z\{U) is a good approximation of ^Ma{U). 
For computational purposes, it suffices to know ^Za(U). 

Remark 3.6. There exist integer matrices A for which ^Ma{U) does not 
agree with the upper bound in Theorem 13. 5[ However, we conjecture that 
^Ma{U) = 4j^Z\{U) holds for matrices A of Segre- Veronese type as in ([H]) 
and strictly positive data vectors U . □ 

Example 3.7. Consider the 100 Swiss Francs example in Section [H Here A 
is unimodular and it has 16145 independent subsets S. The corresponding 
sum of 16145 squarefree monomials in (123!) gives the number of terms in the 
expansion of (jl]). For the data ?7 in ([2]) this sum evaluates to 3, 892, 097. □ 

Example 3.8. We consider the matrix and data from Example 12.51 

/O 1 2 3 4\ 
1^4 3 2 1 Oy) 

(51,18,73,25,75) 

By Theorem 13.51 the lower bound is 22,273 and the upper bound is 48,646. 
Here the number jj^M^{U) of monomials agrees with the latter. □ 

We next present a formula for index (5) when S is any linearly indepen- 
dent subset of the columns of the matrix A. After relabeling we may assume 
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that S = {oi, . . . ,afc} consists of the first k columns of A. Let H = VA 
denote the row Hermite normal form of A. Here V G SLa{1j) and H satisfies 



Hij = for i > j and < Hij < Hjj for i < j. 

Hermite normal form is a built-in function in computer algebra systems. For 
instance, in Maple the command is ihermite. Using the invertible matrix 
V, we may replace A with H, so that M.S becomes M.^ and ZS is the image 
over Z of the upper left k x fc-submatrix of H. We seek the index of that 
lattice in the possibly larger lattice ZA fl Z'^. To this end we compute the 
column Hermite normal form H' = HV. Here V G S'L„(Z) and H' satisfies 

H-j = if i>j or j>d and < Hij < Ha for i < j. 

The lattice ZA fl Z'^ is spanned by the first k columns of H', and this implies 

mdex(^) = — — — . 

^11^22 ■■■ ^kk 



4 Algorithms 

In this section we discuss algorithms for computing the integral ( fTTI) exactly, 
and we discuss their advantages and limitations. In particular, we examine 
four main techniques which represent the formulas (!20l) . (12T|) . (fT6!) and (122|) 
respectively. The practical performance of the various algorithms is compared 
by computing the integral in Example 12. 5[ 

A Maple library which implements our algorithms is made available at 

http : / /math . berkeley . edu/~shaowei/ integrals . html 

The input for our Maple code consists of parameter vectors s = (si, . . . , s^) 
and t = {ti, . . . as well as a data vector U G N". This input uniquely 
specifies the d x n-matrix A. Here d and n are as in ([5]). The output features 
the matrices A and A, the marginal likelihood integrals for A4 and A4^'^\ as 
well as the bounds in ( l24l) . 

We tacitly assume that A has been replaced with the reduced matrix A. 
Thus from now on we assume that A has no repeated columns. This requires 
some care concerning the normalizing constants. All columns of the matrix 
A have the same coordinate sum a, and the convex hull of the columns is 
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the polytope P = Aj ^ x At^ x ■ ■ ■ x A^^. . Our domain of integration is the 
following polytope of dimension 2d — 2k + 1: 

e = Ai X P X P. 

We seek to compute the rational number 

« n 

/ YlicToO"^ +aip''^f^dadedp, (25) 

v=l 

where integration is with respect to Lebesgue probability measure. Our 
Maple code outputs this integral multiplied with the statistically correct 
normalizing constant. That constant will be ignored in what follows. In 
our complexity analysis, we fix A while allowing the data U to vary. The 
complexities will be given in terms of the sample size N = Ui + ■ ■ ■ + Un- 



4.1 Ignorance is Costly 

Given an integration problem such as fl25l) . a first attempt is to use the sym- 
bolic integration capabilities of a computer algebra package such as Maple. 
We will refer to this method as ignorant integration: 

U := [51, 18, 73, 25, 75] : 

f := (s*t~4 +(l-s)*p"4 )"U[1] * 

(s*t~3*(l-t) +(l-s)*p"3*(l-p) )"U[2] * 

(s*t-2*(l-t)-2+(l-s)*p"2*(l-p)-2)"U[3] * 

(s*t *(l-t)~3+(l-s)*p *(l-p)~3)"U[4] * 

(s *(l-t)~4+(l-s) *(l-p)"4)"U[5] : 
11 := int(int(int(f ,p=0. . 1) ,t=0. . 1) ,s=0. . 1) ; 

In the case of mixture models, recognizing the integral as the sum of 
integrals of monomials over a polytope allows us to avoid the expensive in- 
tegration step above by using (l20l) . To demonstrate the power of using (120|) . 
we implemented a simple algorithm that computes each 0a(&, U) using the 
naive expansion in (IT^ . We computed the integral in Example 12.51 with a 
small data vector U = (2, 2, 2, 2, 2), which is the rational number 

66364720654753 
59057383987217015339940000' 

and summarize the run-times and memory usages of the two algorithms in 
the table below. All experiments reported in this section are done in Maple. 
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Time(seconds) 


Memory(bytes) 


Ignorant Integration 


16.331 


155,947,120 


Naive Expansion 


0.007 


458,668 



For the remaining comparisons in this section, we no longer consider the 
ignorant integration algorithm because it is computationally too expensive. 

4.2 Symbolic Expansion of the Integrand 

While ignorant use of a computer algebra system is unsuitable for computing 
our integrals, we can still exploit its powerful polynomial expansion capabil- 
ities to find the coefficients of ( 12T|) . A major advantage is that it is very easy 
to write code for this method. We compare the performance of this symbolic 
expansion algorithm against that of the naive expansion algorithm. The ta- 
ble below concerns computing the coefficients (pAib, U) for the original data 
U = (51,18,73,25,75). The column "Extract" refers to the time taken to 
extract the coefficients 4>A{b, U) from the expansion of the polynomial, while 
the column "Sum" shows the time taken to evaluate (I2U]) after all the needed 
values of 0a(&, U) had been computed and extracted. 







Time (seconds) 




Memory 






Extract Sum 


Total 


(bytes) 


Naive Expansion 


2764.35 


31.19 


2795.54 


10,287,268 


Symbolic Expansion 


28.73 


962.86 29.44 


1021.03 


66,965,528 



4.3 Storage and Evaluation of 0^(6, [/) 

Symbolic expansion is fast for computing (j)A{b, U), but it has two drawbacks: 
high memory consumption and the long time it takes to extract the values of 
4>A{b, U). One solution is to create specialized data structures and algorithms 
for expanding (12T|) . rather using than those offered by Maple. 

First, we tackle the problem of storing the coefficients (f)A{b,U) for b G 
Z\{U) C R'^ as they are being computed. One naive method is to use a 
ci-dimensional array However, noting that A is not row rank full, we can 
use a do-dimensional array to store 4>A{b, U), where = rank(A) = d — k + 1. 
Furthermore, by Proposition 13.3( 2). the expanded integrand is a symmetric 
polynomial, so only half the coefficients need to be stored. We will leave out 
the implementation details so as not to complicate our discussions. In our 
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algorithms, we will assume that the coefficients are stored in a (io-dimensional 
array and the entry that represents 4>A{b, U) will be referred to as 

Next, we discuss how (pAip, U) can be computed. One could use the naive 
expansion f|T9|) . but this involves evaluating many binomials coefficients and 
products, so the algorithm is inefficient for data vectors with large coordi- 
nates. A much more efficient solution uses the recurrence formula 

Algorithm 4.1 (RECURRENCE(A, U)). 
Input: The matrix A and the vector U . 
Output: The coefficients (j)A{b,U). 
Step 1: Create a do-dimensional array of zeros. 
Step 2: For each x E {0, 1, ... , Ui} set 



(f)[xai] 



X 



Step 3: Create a new (io-dimensional array 0'. 
Step 4: For each 2 < j < n do 

1. Set all the entries of 0' to 0. 

2. For each x E {0, 1, ... , Uj} do 

For each non-zero entry 0[6] in do 
Increment (j)'[b + xaj] by (^^)0[6]. 

3. Replace with 0'. 
Step 5: Output the array 0. 

The space complexity of this algorithm is 0{N'^°) and its time complex- 
ity is 0{N'^°~^^). By comparison, the naive expansion algorithm has space 
complexity 0{N'^) and time complexity 0{N"'^^). 

We now turn our attention to computing the integral fl25l) . One major 
issue is the lack of memory to store all the terms of the expansion of the 
integrand. We overcome this problem by writing the integrand as a product of 
smaller factors which can be expanded separately. In particular, we partition 
the columns of A into submatrices A^^\ . . . , A^"^'^ and let U^^\ . . . , t/t™] be the 
corresponding partition of U. Thus the integrand becomes 

m 

j = l V 

where ai^'^ is the fth column in the matrix A^^\ The resulting algorithm for 
evaluating the integral is as follows: 
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Algorithm 4.2 (Fast Integral). 

Input: The matrices A^^\ . . . , A^"^\ vectors U^^\ . . . , f/t™! and the vector t. 
Output: The value of the integral fl25|) in exact rational arithmetic. 
Step 1: For 1 < j < m, compute := RECURRENCE(A[j], [/[^l). 
Step 2: Set / := 0. 

Step 3: For each non-zero entry in ^t^l do 

For each non-zero entry (;/)H[5M] in 

Set b := feW + ■ ■ ■ + c := AU - b, (j) := n7=i (l)^'^[b^. 
Increment / by 

, {\b\/ay.{\c\/ay. prfc U'- b^o^\-b['^ \ U\c^^\-<^1 

V {\U\ + 1)\ llj = l {\b(i)\+U)\ {\c(i)\+U)\ ■ 

Step 4: Output the sum I. 

The algorithm can be sped up by precomputing the factorials used in the 
product in Step 3. The space and time complexity of this algorithm is 0{N^) 
and 0{N^) respectively, where S = max^rankAW and T = ^^rankA^. 
From this, we see that the splitting of the integrand should be chosen wisely 
to achieve a good pay-off between the two complexities. 

In the table below, we compare the naive expansion algorithm and the 
fast integral algorithm for the data U = (51, 18, 73, 25, 75). We also compare 
the effect of splitting the integrand into two factors, as denoted by m = 1 
and m = 2. For m = 1, the fast integral algorithm takes significantly less 
time than naive expansion, and requires only about 1.5 times more memory. 





Time (minutes) 


Memory(bytes) 


Naive Expansion 


43.67 


9,173,360 


Fast Integral (m=l) 


1.76 


13,497,944 


Fast Integral (m=2) 


139.47 


6,355,828 



4.4 Limitations and Applications 

While our algorithms are optimized for exact evaluation of integrals for mix- 
tures of independence models, they may not be practical for applications 
involving large sample sizes. To demonstrate their limitations, we vary the 
sample sizes in Example 12.51 and compare the computation times. The data 
vectors U are generated by scaling U = (51,18,73,25,75) according to the 
sample size N and rounding off the entries. Here, is varied from 110 to 300 
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Figure 1: Comparison of computation time against sample size. 

2-5 T , , , , , , , , , 1 




H — \ — \ — \ — I — \ — \ — \ 

2 2.05 2.1 2.15 2.2 2.25 2.3 2.35 2.4 2.45 2.5 

log(N) 

by increments of 10. Figure [1] shows a logarithmic plot of the results. The 
times taken for = 110 and = 300 are 3.3 and 98.2 seconds respectively. 
Computation times for larger samples may be extrapolated from the graph. 
Indeed, a sample size of 5000 could take more than 13 days. 

For other models, such as the 100 Swiss Francs example in Section 1 and 
that of the schizophrenic patients in Example 15. 5^ the limitations are even 
more apparent. In the table below, for each example we list the sample size, 
computation time, rank of the corresponding A-matrix and the number of 
terms in the expansion of the integrand. Despite having smaller sample sizes, 
the computations for the latter two examples take a lot more time. This may 
be attributed to the higher ranks of the A-matrices and the larger number 
of terms that need to be summed up in our algorithm. 





Size 


Time 


Rank 


# Terms 


Coin Toss 


242 


45 sec 


2 


48,646 


100 Swiss Francs 


40 


15 hrs 


7 


3,892,097 


Schizophrenic Patients 


132 


16 days 


5 


34,177,836 



Despite their high complexities, we believe our algorithms are important 
because they provide a gold standard with which approximation methods 
such as those studied in [T] can be compared. Below, we use our exact meth- 
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ods to ascertain the accuracy of asymptotic formula derived by Watanabe et 
al. using desingularization methods from algebraic geometry IT5| [TB]. 



Example 4.3. Consider the model from Example 12.51 Choose data vectors 
U = {Uq, Ui, U2, U3, t/4) with Ui = Nqi where is a multiple of 16 and 

Let In{U) be the integral (!25|) . Define 

4 

Fm{U) = NY,1^^ogqi-\ogIN{U). 

i=0 

According to [16], for large we have the asymptotics 

MMU)] = ^logAr + 0(l) (26) 

where the expectation Ejj is taken over all U with sample size A^ under the 
distribution defined by g = (go, ^i, i?2, 93, i?4)- Thus, we should expect 

Fie+AT-F^ ^ ^\ogilQ + N)-^\ogN=:g{N). 

We compute -Fie+Af — Fn using our exact methods and list the results below. 



A^ 


FiQ+N — Fn 


9iN) 


16 


0.21027043 


0.225772497 


32 


0.12553837 


0.132068444 


48 


0.08977938 


0.093704053 


64 


0.06993586 


0.072682510 


80 


0.05729553 


0.059385934 


96 


0.04853292 


0.050210092 


112 


0.04209916 


0.043493960 



Clearly, the table supports our conclusion. The coefficient 3/4 of logA^ in 
the formula fl26l) is known as the real log- canonical threshold of the statistical 
model. The example suggests that our method could be developed into a 
numerical technique for computing the real log-canonical threshold. □ 
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5 Back to Bayesian statistics 



In this section we discuss how the exact integration approach presented here 
interfaces with issues in Bayesian statistics. The first concerns the rather 
restrictive assumption that our marginal hkehhood integral be evaluated with 
respect to the uniform distribution (Lesbegue measure) on the parameter 
space O. It is standard practice to compute such integrals with respect 
to Dirichlet priors, and we shall now explain how our algorithms can be 
extended to Dirichlet priors. That extension is also available as a feature in 
our Maple implementation. 

Recall that the Dirichlet distribution Dir(a) is a continuous probability 
distribution which is parametrized by a vector a = (ao, ai, . . . , a™,) of pos- 
itive reals. It is the multivariate generalization of the beta distribution and 
is conjugate prior (in the Bayesian sense) to the multinomial distribution. 
This means that the probability distribution function of Dir(Q;) specifies the 
belief that the probability of the ith among m + 1 events equals 6i given that 
it has been observed — 1 times. More precisely, the probability density 
function f{6] a) of Dir(Q;) is supported on the m-dimensional simplex 

Am = {(^0, • • • 5 ^m) G IR>Q : ^0 + h 6'm = l}, 

and it equals 



3a— 1 



[a) i!&[a) 
Here the normalizing constant is the multinomial beta function 

m!r(ao)r(ai) ■ ■ ■ r(a^ 



[a] 



r(ao + tti H h am) 

Note that, if the ctj are all integers, then this is the rational number 

m!(ao-l)!(ai-l)!---(am-l)! 



[a 



(ao H \-arn-iy. 



Thus the identity f|T2|) is the special case of the identity f{6; a)d9 = 1 
for the density of the Dirichlet distribution when all a, = 6, + 1 are integers. 

We now return to the marginal likelihood for mixtures of independence 
models. To compute this quantity with respect to Dirichlet priors means 
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the following. We fix positive real numbers ao,ai, and (3^^^ and 7^*'' for 
i = 1, . . . ,k and j = 0, . . . , tj. These specify Dirichlet distributions on Ai, 
P and P. Namely, the Dirichlet distribution on P given by the /^j'^ is the 
product probability measure given by taking the Dirichlet distribution with 
parameters {(3q\ f]i \ . . . , Ptl^) on the i-th factor At- in the product ([7]) and 
similarly for the 7^*''. The resulting product probability distribution on B = 
Ai X P X P is called the Dirichlet distribution with parameters (a,/5,7). 
Its probability density function is the product of the respective densities: 

By the marginal likelihood with Dirichlet priors we mean the integral 

/ Liu{a,9, p) f{a,6, p;a, (3,'y)dad9dp. (28) 
Je 

This is a modification of (fT5l) and it depends not just on the data U and the 
model Ai^"^^ but also on the choice of Dirichlet parameters (a,/3, 7). When 
the coordinates of these parameters are arbitrary positive reals but not in- 
tegers, then the value of the integral fl28l) is no longer a rational number. 
Nonetheless, it can be computed exactly as follows. We abbreviate the prod- 
uct of gamma functions in the denominator of the density (|27|) as follows: 

fc fc 
B(a,/?,7) := B(a) ■ JJ B(/?«) ■ JJ B(7(*)). 

1=1 i=l 

Instead of the integrand (fTSjl we now need to integrate 

|6|/a+Q()-l |c|/a+ai-l c+'y-l 

B(a,A7) ° ' 

c=AU-b 

with respect to Lebesgue probability measure on O. Doing this term by term, 
as before, we obtain the following modification of Theorem 13.21 

Corollary 5.1. The marginal likelihood of the data U in the mixture model 
TW*^^-* with respect to Dirichlet priors with parameters (a,/3, 7) equals 

N\ V . rh Jh TT\ r(|b|/a+«o)r(|c|/a+ai) 

t/i!-i7„!-B(a,/3,7) ' 2^b&Z\{U) (PA^O, U ) r{\U\ + \a\) 

c=AU-b 



{b,U) 

1 

r(\b(i)\+\i3(i)\) ' r(|c(^)n-|7(')|) 



. i,!r(bW+^W)...r(b(0+^W) t.!r(cW+7^^))...r(4f+7if) ^ 
ll*=iV r(\b(i)\+\0(i)\) rfic(^)n-h(')h )■ 
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A well-known experimental study by Chickering and Heckerman [T] com- 
pares different methods for computing numerical approximations of marginal 
likelihood integrals. The model considered in [Ij is the naive-Bayes model, 
which, in the language of algebraic geometry, corresponds to arbitrary se- 
cant varieties of Segre varieties. In this paper we considered the first secant 
variety of arbitrary Segre- Veronese varieties. In what follows we restrict our 
discussion to the intersection of both classes of models, namely, to the first 
secant variety of Segre varieties. For the remainder of this section we fix 

Si = S2 = ■ ■ ■ = Sk = I 

but we allow ti,t2, ■ ■ ■ ,tk to be arbitrary positive integers. Thus in the model 
of [H Equation (1)], we fix vq = 2, and the n there corresponds to our k. 

To keep things as simple as possible, we shall fix the uniform distribution 
as in Sections 1-4 above. Thus, in the notation of [H §2], all Dirichlet hy- 
perparameters aijk are set to 1. This implies that, for any data U G N"' and 
any of our models, the problem of finding the maximum a posteriori (MAP) 
configuration is equivalent to finding the maximum likelihood (ML) configu- 
ration. To be precise, the MAP configuration is the point {a, 6, p) in O which 
maximizes the likelihood function Iju{a,6,p) in f|T^ . This maximum may 
not be unique, and there will typically be many local maxima. Chickering 
and Heckerman [T], §3.2] use the expectation maximization (EM) algorithm 
fr]\ §1.3] to compute a numerical approximation of the MAP configuration. 

The Laplace approximation and the BIC score [H §3.1] are predicated 
on the idea that the MAP configuration can be found with high accuracy 
and that the data U were actually drawn from the corresponding distribu- 
tion p{a, 9, p). Let H(cr, 0, p) denote the Hessian matrix of the log-likelihood 
function logL((T, 6*, p). Then the Laplace approximation [H equation (15)] 
states that the logarithm of the marginal likelihood can be approximated by 

1 2d — 2k -\- ^ 
logL(a,0,p) - -log|detH(a,^,/))| + _J_log(27r). (29) 

The Bayesian information criterion (BIC) suggests the coarser approximation 

9(7 — 9 A- -I- 1 

logL(a,^,p) - f ^ log(Ar), (30) 

where = f/i + ■ ■ ■ + f/„ is the sample size. 

In algebraic statistics, we do not content ourselves with the output of 
the EM algorithm but, to the extent possible, we seek to actually solve the 
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likelihood equations [TT] and compute all local maxima of the likelihood 
function. We consider it a difficult problem to reliably find {cr,6,p), and we 
are concerned about the accuracy of any approximation like fl29l) or fl30l) . 

Example 5.2. Consider the 100 Swiss Francs table ([2]) discussed in the 
Introduction. Here = 2, Si = S2 = 1, ti = ^2 = 3, the matrix A is 
unimodular, and (Q is the Segre embedding x P^^. The parameter 

space is 13-dimensional, but the model Ai^"^^ is 11-dimensional, so the given 
parametrization is not identifiable [6]. This means that the Hessian matrix 
H is singular, and hence the Laplace approximation fl2^ is not defined. □ 

Example 5.3. We compute fl29l) and fl30l) for the model and data in Example 
12. 5[ According to [HI Example 9], the likelihood function Po^Pi^pppl^pp has 
three local maxima (^0,^1,^2,^3,^4) in the model J^^^\ and these translate 
into six local maxima {a, 6, p) in the parameter space G, which is the 3-cube. 
The two global maxima (ao,^o,Po) in © are 

(0.3367691969, 0.0287713237, 0.6536073424), 
(0.6632308031, 0.6536073424, 0.0287713237). 

Both of these points in give the same point in the model: 

{po,Pi,P2,P3,P4) = (0.12104,0.25662,0.20556,0.10758,0.30920). 

The likelihood function evaluates to 0.1395471101 x 10^^^ at this point. The 
following table compares the various approximations. Here, "Actual" refers 
to the base-10 logarithm of the marginal likelihood in Example 12. 5[ 



BIG 


-22.43100220 


Laplace 


-22.39666281 


Actual 


-22.10853411 



The method for computing the marginal likelihood which was found to 
be most accurate in the experiments of Chickering and Heckerman is the 
candidate method [T| §3.4]. This is a Monte-Carlo method which involves 
running a Gibbs sampler. The basic idea is that one wishes to compute a 
large sum, such as (1201) by sampling among the terms rather than listing all 
terms. In the candidate method one uses not the sum fl20l) over the lattice 
points in the zonotope but the more naive sum over all 2^ hidden data that 
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would result in the observed data represented by U. The value of the sum is 
the number of terms, 2^, times the average of the summands, each of which 
is easy to compute. A comparison of the results of the candidate method 
with our exact computations, as well as a more accurate version of Gibbs 
sampling which is adapted for fl20l) . will be the subject of a future study. 

One of the applications of marginal likelihood integrals lies in model se- 
lection. An important concept in that field is that of Bayes factors. Given 
data and two competing models, the Bayes factor is the ratio of the marginal 
likelihood integral of the first model over the marginal likelihood integral 
of the second model. In our context it makes sense to form that ratio for 
the independence model Ai and its mixture Ai^'^\ To be precise, given any 
independence model, specified by positive integers si, . . . , Sk,ti, . . . ,tk and 
a corresponding data vector U G N", the Bayes factor is the ratio of the 
marginal likelihood in Lemma 12.21 and the marginal likelihood in Theorem 
13. 2[ Both quantities are rational numbers and hence so is their ratio. 

Corollary 5.4. The Bayes factor which discriminates between the indepen- 
dence model Ai and the mixture model Ai^'^^ is a rational number. It can be 
computed exactly using Algorithm \4.S\ (and our Ma-ple-implementation) . 

Example 5.5. We conclude by applying our method to a data set taken 
from the Bayesian statistics literature. Evans, Gilula and Guttman [51 §3] 
analyzed the association between length of hospital stay (in years Y) of 132 
schizophrenic patients and the frequency with which they are visited by their 
relatives. Their data set is the following contingency table of format 3x3: 





2<r<10 


io<r<2o 


20<Y 


Totals 


Visited regularly 


43 


16 


3 


62 


Visited rarely 


6 


11 


10 


27 


Visited never 


9 


18 


16 


43 


Totals 


58 


45 


29 


132 



They present estimated posterior means and variances for these data, where 
"eac/i estimate requires a 9-dimensional integration" f5^, p. 561]. Computing 
their integrals is essentially equivalent to ours, for /c = 2, si = S2 = 1, ti = 
^2 = 2 and = 132. The authors emphasize that "the dimensionality of 
the integral does present a problem" |5l p. 562], and they point out that "all 
posterior moments can be calculated in closed form .... however, even for 
modest N these expressions are far to complicated to be useful" p. 559]. 
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We differ on that conclusion. In our view, the closed form expressions in 
Section 3 are quite useful for modest sample size N. Using Algorithm \4.2\ 
we computed the integral fl25l) . It is the rational number with numerator 

278019488531063389120643600324989329103876140805 
285242839582092569357265886675322845874097528033 
99493069713103633199906939405711180837568853737 

and denominator 

12288402873591935400678094796599848745442833177572204 
50448819979286456995185542195946815073112429169997801 
33503900169921912167352239204153786645029153951176422 
43298328046163472261962028461650432024356339706541132 
34375318471880274818667657423749120000000000000000. 

To obtain the marginal likelihood for the data U above, that rational number 
(of moderate size) still needs to be multiplied with the normalizing constant 

132! 

43! ■ 16! ■ 3! ■ 6! ■ 11! ■ 10! ■ 9! ■ 18! • 16!' 
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